Enriched Search Features Based In Part On Discovering People-Centric Search Intent

ABSTRACT

A search environment of an embodiment includes name mining and matching features used in part to identify people-centric queries and provide an enriched search experience, but is not so limited. A method of an embodiment operates to provide an expanded query based in part on a geometric similarity measure, an edit distance measure, a string similarity measure, and a cumulative similarity measure. A search system of an embodiment includes a mined candidate generator component and a name matcher component used in part to identify name queries and provide an expanded query that includes original query terms and one or more valid mined names. Other embodiments are also disclosed.

BACKGROUND

Search engines provide users with a tool that can be used to locate relevant information such as documents, web sites, and other files using keyword inputs. A different search paradigm, sometimes referred to as a “people search,” “person finder,” or “people locator,” has emerged as a different type of search service. A people searching paradigm is focused on people, whereas general web search and enterprise search tools typically encompass a wide range of topics including people, products, news, events, etc. Personal name inputs, including misspelled and omitted names, tend to be the predominant queries submitted in a people search domain, which may employ large authoritative name directories having names in the tens of thousands or millions. Spelling errors in personal names are of a different nature as compared to those in general text. Thus, to ensure a desirable user experience and promote return of search service users, correcting misspelled personal names plays a noteworthy role in reducing the time and effort required by users to find people they are searching for.

Some search systems rely on a correct entry of a person's exact name when a user searches over a broad search space and are typically not configured to yield any profile information of a person who is the subject of a search. As an example, one search solution constraint may require a user to explicitly navigate to a “People” bar or site to search for a person's profile, resulting solely in a ranked collection of profiles that contain the exact name terms, with no additional information provided. A user needs to perform additional steps, including executing subsequent queries, to extract any additional information beyond any profile information. For example, a user would need to perform an additional searching operation to query and fetch any documents authored by a top ranked profile. Such limited search capabilities provide overly constrained search results and lack promotion of user confidence in the search system or service.

SUMMARY

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended as an aid in determining the scope of the claimed subject matter.

Embodiments are provided that include the use of name mining and matching features in part to identify people-centric queries and provide an enriched search experience, but are not so limited. In an embodiment, a method operates to provide an expanded query based in part on a geometric similarity measure, an edit distance measure, a string similarity measure, and a cumulative similarity measure. A search system of an embodiment includes a mined candidate generator component and a name matcher component used in part to identify name queries and provide an expanded query that includes original query terms and one or more valid mined names. Other embodiments are also disclosed.

These and other features and advantages will be apparent from a reading of the following detailed description and a review of the associated drawings. It is to be understood that both the foregoing general description and the following detailed description are explanatory only and are not restrictive of the invention as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an exemplary search environment.

FIG. 2 is a flow diagram illustrating an exemplary process of providing expanded query features.

FIG. 3 is a block diagram of an exemplary people-centric search system.

FIG. 4 is a flow diagram illustrating an exemplary process of providing mined candidate and name matching features.

FIG. 5 is a block diagram illustrating an exemplary computing environment for implementation of various embodiments described herein.

DETAILED DESCRIPTION

As described below, embodiments encompass intelligent search features capable of determining whether any given search query is people-centric or person-centric, but are not so limited. A people-centric query can be described as a search query where an intent or focus of a particular query is to retrieve information associated with an individual or individuals. Embodiments can be used to process queries that are not an exact name of a person, including discovering intent from misspelled and/or omitted queries, without having to navigate to any particular people-search interface or application. As discussed below, an embodiment uses character bigrams rather than a phonetic representation in part to provide people-centric query features.

Embodiments provide people-centric query determination and expansion features that include providing and/or using an altered, expanded, and/or corrected query when a people-centric query is identified, but the embodiments are not so limited. For example, a search engine can use a people-centric query determination algorithm to provide transformed queries resulting in part from identification and/or correction of name spelling errors and/or other input errors. In an embodiment, components of a search service can use mined name candidates and a multi-level name constraint set to generate valid personal names that can be used as query expanders or transformers.

Various embodiments can also provide customizable actions if a people-centric query is identified to thereby enrich the user's search experience and promote further use of the people-centric search service. For example, a search experience can be enhanced by performing customizable actions when a people-centric query is identified to not only return a person's record associated with the people-centric search but also return top documents authored by the person, curriculum vitae information, social network contact information, v-card information, profile information from any number of social networks, and/or render the result differently or with additional identifying information (e.g., provide a photo of a searched for person), etc. A user can adjust how and when customized actions return information. Users can opt out of having personal and other information returned or used by the people-centric search service.

In an embodiment, as part of discovering or identifying people-centric queries, if any given search query likely contains a name of a person, including misspelled and omitted names, a search service can operate to determine that the given query is a people-centric query and the intent of the query is to extract a person's record, profile, and/or other information associated with the person subject to the query. For example, a search service can operate using a name mining and matching algorithm to identify valid names based in part on a misspelled or mistaken name query input. Accordingly, a query input need not be an exact or a correct name of a person and various embodiments can distinguish such inputs as name queries including correcting any misspellings and generating expanded or transformed queries using an original input and one or more valid name expanders, as described below. A search service of an embodiment operates in part to discover people-name or people-centric queries from universal search queries in an enterprise or other setting.

In an embodiment, a searching interface can use people-centric query determination features in part to identify a searcher's intent including focusing search input to name queries and providing altered, expanded, and/or transformed queries that include one or more personal names including corrected name inputs. The people-centric query determination features can be used to correct a misspelled query input and return the most likely name based on how close the original query is to a corrected name or names provided based in part on name mining and matching features. A searching interface of one embodiment operates in conjunction with a search server to mine personal names and provide potential candidates that can be validated and/or invalidated, wherein validated candidates can be used as query expanding terms or names. Expanded queries can be automatically executed and relevant results provided, or suggested to the user as a query suggestion for affirmative input by a user. For example, a transformed name query can be presented to a user as a suggested refinement to an existing search and run only if the user clicks a link or otherwise affirmatively selects to use the name query suggestion.

FIG. 1 is a block diagram of an exemplary search environment 100 that includes processing, memory, and other components that provide query intent determination and expansion features as part of a searching operation, but is not so limited. As shown in FIG. 1, the environment 100 includes a search server 102 including a search engine 104 configured with people-centric query determination and processing features, a query expansion or transformation component 106, an input component 108 or other parsing component(s), and/or processing/memory/communication/networking/application resources 110, but is not so limited. In addition to features described herein, the functionality of the search server 102 or other component(s) can include indexing and data structure population and/or maintenance services, web content management, enterprise content services, enterprise search, shared business processes, business intelligence services, and/or other features.

The input component 108 is configured in part to tokenize or otherwise parse an input query string into constituent parts, such as one or more original query terms or tokens for example. Correspondingly, and as described further below, the query expansion component 106 can provide expanded queries based in part on a tokenized input string. In one embodiment, the input component 108 includes a tokenizer that can be included and used locally with a client. In another embodiment, the tokenizer or other parsing component can be included with server 102 or shared therebetween. It will be appreciated that different methods of tokenization, regular expression, and other parsing and/or string recognition features can be used based in part on an input language used.

As an example, the input component 108 can be used to tokenize portions of a received query using a word breaker component according to the input query language. For example, a word breaker algorithm can be implemented that operates to parse query inputs based in part on occurrences of white space, punctuation, and/or other parsing keys. Different word breakers can be used according to the input language and/or preferred result language. A pattern matching algorithm such as a regular expression that does not rely on the input string being broken into segments can also be used. Alternatively, the word breaking can be part of the regular expression when the regular expression includes punctuation and/or whitespace or other delimiting characters. It should be noted that other textual matching technologies, including literal string matching, natural language parsing, and other information processing techniques, can be utilized in accordance with a particular implementation.
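As an illustration only, a word breaker of the kind described above might be sketched as follows. The delimiter set and the function name `word_break` are assumptions made for this example; a particular implementation would select its parsing keys according to the input language.

```python
import re

# Hypothetical delimiter set: whitespace plus common punctuation.
# Hyphens and apostrophes are deliberately kept, since they occur in names.
_DELIMITERS = re.compile(r"[\s,.;:!?()\[\]\"]+")


def word_break(query: str) -> list[str]:
    """Split a query string into tokens on whitespace and punctuation."""
    return [tok for tok in _DELIMITERS.split(query) if tok]


tokens = word_break("Jon  Smith, documents")
print(tokens)
```

Empty strings produced by leading or trailing delimiters are discarded so that only non-empty query tokens are passed on to later stages.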

As shown for the exemplary environment 100 of FIG. 1, a number of exemplary components are communicatively coupled to the search server 102, including a smartphone client 112, a laptop client 114, and/or a desktop client 116. Each client can use a search interface (local or web-based) to submit queries to and receive personal name identification and other search results from the search server 102. For example, a user can use a search interface to input characters, words, etc., such as personal names or pseudonyms for example, which can be parsed and used in part to recognize a user intent to search for individuals based on a people-centric query determination algorithm, described further below.

As an example, a user interface, such as a handheld browser or search window, can be used to receive typed, inked, stylus, verbal, and/or other affirmative user inputs, and the query expansion component 106 can operate to provide personal name query expanders or expanding terms that include one or more personal names that include corrected inputs. As one example, the environment 100 can include searching and indexing features used in conjunction with at least one corpus of information including name directory and other information. A corpus of information can be representative of local, Intranet, Internet, and/or other networked information repositories. The corpus of information can be indexed and searched over when mining for name candidates. In one embodiment, a list of names can be imported from a directory service as part of mining name candidates. While a limited number of clients are shown, it will be appreciated that the search server 102 can serve any number of clients.

Components of the environment 100 can be used as part of searching one or more indexed data structures for relevant information associated with a user query. It will be appreciated that the search server 102 can use one or more search indexes, such as inverted and/or other index data structures for example, to provide and/or use an expanded query. For example, an inverted index can be built for each name directory, wherein each associated name can be broken into constituent tokens to form a set of distinct name tokens, using the name tokens and original names.
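By way of a minimal sketch, and assuming whitespace-delimited names and case-insensitive tokens (neither of which is specified above), an inverted index from distinct name tokens to the original names might be built as follows. The function name `build_inverted_index` and the sample directory are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(names):
    """Map each distinct, lower-cased name token to the names containing it."""
    index = defaultdict(set)
    for name in names:
        for token in name.lower().split():
            index[token].add(name)
    return index


# Hypothetical directory contents, for illustration only.
directory = ["Anna Lee", "Anna Maria Smith", "John Smith"]
index = build_inverted_index(directory)
print(sorted(index["smith"]))
```

Keeping the original names as the posting entries allows complete names to be recovered from any matching token in a later stage.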

As described briefly above, the query expansion component 106 can operate to provide expanded queries that provide further focus to an original query input based in part on recognition of search intent corresponding to a personal or other name search. The query expansion component 106 can operate to provide a set of candidates to allow the search engine 104 to retrieve multiple possible names the user might be looking for, or a single most likely candidate, which may be viewed as a spelling correction, rather than expansion. The spelling correction can be used for high confidence misspellings and improve quality of the results by reducing the recall of likely irrelevant results and thereby focusing the user input on a particular candidate. The query expansion component 106 can provide one or more expanded query terms or strings that can be used by the search engine 104 to provide search results to a querying user. The query expansion component 106 can use one or more original input terms or tokens as part of a query expansion, alteration, or transformation operation.

With continuing reference to FIG. 1, as described above, components of the environment 100 can be used in part to discover the intent of a search query and use this information to enrich the end-user search experience. For example, components of the environment 100 can be used as part of identifying people-centric search queries and providing expanded name queries such as by narrowing a query to a single best candidate based on some confidence threshold for example. Components of the environment 100 can operate to distinguish a query between a personal name and a query unrelated to a personal name as part of discovering if a searcher's intent is person-centric or people-centric. For example, an assumption can be used that if a given query contains a person's name then the user is trying to discover information about a person, and hence the search intent is determined to be people-centric.

Any given search query can be analyzed for validation as a people-centric query, including discovering name queries as a search intention from misspelled and other erroneous inputs. Components can operate to not only detect the people name query intent, but also correct an original query to a most likely name. A rewritten or expanded query can be used to precisely return the people data for a corrected name, providing for a better user search experience. For example, an original query input can be determined to be a name misspelling input and rewritten and then executed as part of a search interface limited to providing a limited number of people records (e.g., 1 or 2). The resulting people records can also be interleaved and/or otherwise presented with general web results when a name is detected from the original query with high probability. For example, an input query can be identified as a misspelled name query, corrected to a correct name query, and used as part of a query correction and/or rewriting operation.

In an embodiment, subsequent to any query pre-processing operations, the search server 102 can operate to first mine a set of similar candidates or nearest neighbor names to a given query, and then determine if any similar candidates satisfy a set of pre-defined and/or configurable constraint thresholds. If a mined candidate satisfies all the thresholds of a level, then the search server 102 can identify a given query as a name query. If the search server 102 discovers the intent of a query as people-centric then, in addition to returning information associated with a person of interest, such as a person's record for example, certain customizable actions can be triggered. For example, the customizable actions can be used to render search results in different ways, return authored documents, fetch profile information from corporate and/or public social networks, etc. Correspondingly, in addition to returning correct person information from misspelled and omitted queries, the search server 102 can also attempt to determine a user's subsequent queries (intent) a priori and thereby enrich an overall search experience, promoting a rich and user-friendly search environment.

The exemplary environment 100 of one embodiment includes a networked and/or local name directory component or components having a collection of unique names of employees or other individuals associated with an enterprise or other organization or network(s). When a user enters a query, the search engine 104 can operate to mine names from the name directory component that are nearest neighbors or otherwise correspond with a certain similarity measure, also referred to as a structural similarity measure. The search engine 104 can operate to evaluate structural similarities between mined potential candidates and an original query to determine if people-centric search intent is valid.

The search engine 104 of an embodiment operates to distinguish an original query input between a name query and a non-name query by passing original query terms or tokens through a flow of people-centric determination filter nodes. A people-centric determination filter node can be used to determine if the query meets specific requirements and, if so, allows the query to proceed to a subsequent people-centric determination filter node. An original query can be classified or identified as a people-centric or name query if the query passes through enabled people-centric determination filter nodes.

In one embodiment, the search engine 104 uses a number of people-centric determination nodes or stages that include a number of active filters and/or passive filters. Since a user can enter a misspelled name query, an active filter can be used in part to mine for potential name candidates from a name directory or other information repository to account for such inputs. The active filter can be used to mine potential candidates that are most closely related to an original query input using a similarity or some other quantifying measure. The active filter of an embodiment uses structural similarity features in part to mine nearest neighbors to the given query as candidates.

In an embodiment, as part of mining a voluminous name directory or directories, a mined candidate stage performs, for each token of an original query, an approximate nearest neighbor search of name tokens to produce a list of candidate matches, such as directory tokens that are approximate matches of or structurally similar to an original query token for example. It will be appreciated that a token can be defined as a word of a personal name comprising a continuous string of characters consistent with the types of characters employed in personal names in the language of the name. Using candidate tokens, candidate names can be extracted which contain at least one of the approximate matching tokens. If no candidates are mined, the active filter operations end and/or a message can be returned to the user of the exiting operation, including redirection to a different search interface or result.

A hashing procedure to hash personal name tokens, query tokens, and/or other tokens can be implemented according to a desired outcome. In an embodiment, a data-driven learning hash function technique provides for mapping similar names to similar binary codewords based on a set of personal names in a given language (e.g., monolingual data). In one embodiment, learning certain hash functions for mapping similar names to similar binary codewords can be based in part on use of name equivalents or other measures in multiple languages. The language of an equivalent personal name can also be in a different script from the other equivalent names. For example, in a two-language implementation, name pairs can be used as training data including anticipated names in the language and script and/or an equivalent name in a different language.

For example, given a personal name query that has been broken up into its constituent tokens Q=S₁S₂ . . . S_(I), each token S_(i) is hashed into a codeword y_(i) using an appropriate previously learned hash function (e.g., a hash function learned from using monolingual training names, or a hash function learned for the language of the query when multilingual training names are employed). For each of the resulting query codewords y_(i), those codewords y_(i)′ in the previously built directory index that are at a prescribed distance (e.g., Hamming distance) of r or less from y_(i) are identified. For example, a Hamming distance of 4 can be used. The name tokens corresponding to each of the identified codewords are then retrieved from the index and ranked. In one implementation, this ranking involves the use of a token-level similarity scoring procedure.
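The codeword retrieval step can be illustrated with the following sketch. The learned hash function itself is not reproduced here; the 8-bit codewords, the sample index, and the function names are hypothetical placeholders, and a linear scan is used purely for clarity rather than as a suggested index structure.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions at which two codewords differ."""
    return bin(a ^ b).count("1")


def tokens_within(query_code: int, code_index: dict, r: int = 4) -> list:
    """Return directory tokens whose codeword is at distance r or less."""
    hits = []
    for code, tokens in code_index.items():
        if hamming_distance(query_code, code) <= r:
            hits.extend(tokens)
    return hits


# Hypothetical codewords standing in for the output of a learned hash function.
code_index = {
    0b10110010: ["smith"],
    0b10110011: ["smyth"],
    0b01001101: ["jones"],
}
print(tokens_within(0b10110000, code_index, r=4))
```

The retrieved tokens would then be ranked by a token-level similarity score before candidate names are assembled.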

In one embodiment, token-level similarity scoring includes the use of a logistic function applied over multiple distance measures to compute a similarity score between name tokens S from the query and S′ of the name tokens corresponding to the identified codewords retrieved from an associated index. For example, this token-level similarity scoring function can be defined as:

${K\left( {s,s^{\prime}} \right)} = {\frac{1}{1 + ^{- {\sum\limits_{i}{\alpha_{i}{d_{i}{({s,s^{\prime}})}}}}}}.}$

Where K(S,S′) is the token-level similarity score between S and S′, d_(i) is the i^(th) distance measure, and α_(i) is a weighting factor for the i^(th) distance measure.
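As a minimal sketch of the logistic scoring function described above, the weighted sum of the distance measures can be passed through a logistic function as follows. The function name `token_similarity` and the sample values are illustrative assumptions; the particular distance measures and their weights are not specified here.

```python
import math


def token_similarity(distances, weights):
    """Logistic function over a weighted sum of distance measures."""
    z = sum(a * d for a, d in zip(weights, distances))
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative values only: two distance measures with assumed weights.
print(token_similarity([0.8, 0.5], [1.5, 0.7]))
```

A weighted sum of zero maps to a score of 0.5, and the score approaches 1 as the weighted sum grows.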

Unlike active filters, passive filters of an embodiment determine if an original input query meets specific constraints and, if so, then the input query proceeds to a name matching determination node. Passive filters can include restrictive filters and/or adaptive filters. Restrictive filters can be configured with fixed constraints, but are not so limited. In case a query input fails to meet an associated constraint, the search process ends or can be redirected to another component or process.

A name determination node of an embodiment can include the use of an adaptive filter to process mined name candidates. For example, a name matching process can be used as part of identifying a best match, or up to a prescribed number (e.g., 10) of the top scoring matches, between a personal name query and the candidate personal names taken from a candidate pool. The query and personal names in the candidate pool may typically have multiple name parts (i.e., multiple words or tokens making up the personal name). Thus, a measure of similarity between the full personal name in the query and each of the full candidate names in the candidate pool can be computed using the individual token-level similarity scores computed for each token associated with both the query and the names in the candidate pool or set.

In an embodiment, a multi-token name similarity measure can be computed as follows. First, let Q=S₁S₂ . . . S_(I) and D=S₁′S₂′ . . . S_(J)′ be two multi-token names, where Q corresponds to a personal name query, and D corresponds to one of the candidate personal names from a candidate pool. To compute the similarity between Q and D, a weighted bipartite graph is formed with a node for each S_(i) and a node for each S_(j)′, and with an edge weight between each pair of nodes being set to a previously computed token-level similarity measure K(S_(i),S_(j)′). The weight (κ_(max)) of the maximum weighted matching is computed. This maximum weighted matching represents the greatest possible sum of the individual edge weights over a set of edges in which each node is matched at most once. A maximal matching computed using a greedy approach can be used in some implementations since many of the edges in the bipartite graph may have a low weight.

Thus, a similarity between Q and D can be computed as:

${K\left( {Q,D} \right)} = {\frac{\kappa_{\max}}{{{I - J}} + 1}.}$

Where K(Q,D) is the similarity score between the personal name query Q and a candidate personal name D, I is the number of tokens in the personal name query Q, and J is the number of tokens in the candidate personal name D.
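The multi-token similarity computation can be sketched as follows, using the greedy maximal matching mentioned above in place of an exact maximum weighted matching and assuming the normalizer is |I − J| + 1. The function names and the exact-match token scorer are hypothetical stand-ins; in practice the token-level similarity score would be supplied.

```python
def name_similarity(query_tokens, cand_tokens, token_sim):
    """Greedy bipartite matching weight divided by (|I - J| + 1)."""
    # Enumerate every edge of the bipartite graph, heaviest first.
    edges = sorted(
        ((token_sim(s, t), i, j)
         for i, s in enumerate(query_tokens)
         for j, t in enumerate(cand_tokens)),
        reverse=True,
    )
    used_q, used_c = set(), set()
    k_max = 0.0
    for weight, i, j in edges:
        # Accept an edge only if neither endpoint is already matched.
        if i not in used_q and j not in used_c:
            used_q.add(i)
            used_c.add(j)
            k_max += weight
    return k_max / (abs(len(query_tokens) - len(cand_tokens)) + 1)


def exact(s, t):
    """Toy token-level scorer used only for this example."""
    return 1.0 if s == t else 0.0


print(name_similarity(["john", "smith"], ["smith", "john"], exact))
```

Because the matching ignores token order, a query with the first and last names swapped still scores as highly as the correctly ordered name, while a difference in token count reduces the score through the denominator.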

An adaptive filter of one embodiment includes two sets of the same type of constraints used in part to process mined candidates and/or other query input. If a query fails to meet any constraint of either set, it is removed from further consideration as a name query. In case a query passes all the constraints in either level, it can be classified as a valid personal name and output as a query expander. In an embodiment, queries can be pre-processed before performing a name matching process. For example, a query cleaning process can include removing certain character terms like parentheses, quotes, numbers, and other non-name portions from a query before performing constraint level determinations. Such a query cleaning process can be implemented at some other time as well, such as part of processing the original query input.

As described above, the search engine 104 can use a number of processing nodes or filters to validate an original query as part of classifying as a valid name query. In one embodiment, the search engine 104 uses a restrictive filter that detects specific character tokens not generally found in names. For example, specific character tokens (e.g., ‘#’, ‘@’, ‘!’, [0-9], etc.) can be characterized as noise, and an associated query can be removed from further consideration in the name query determination and/or expansion process.

A second restrictive filter can be used as part of filtering out minimal token queries. For example, queries having fewer than a certain number of query tokens provided as a result of a wordbreaking or other parsing service can be filtered out of the name matching process. For example, a minimal token query can be prevented from proceeding when the number of query tokens falls below a predefined threshold (e.g., two (2)). Such minimal token queries can be classified as not being name queries and removed from further consideration in the process. The search engine 104 of an embodiment uses an active filter having two (2) mining phases to mine for potential name candidates, but is not so limited.
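The two restrictive filters described above can be sketched as a pair of checks applied in sequence. The noise pattern covers only the example characters given above, and the whitespace-based token count and the names `NOISE`, `MIN_TOKENS`, and `passes_restrictive_filters` are assumptions made for illustration.

```python
import re

# Noise characters taken from the examples above: '#', '@', '!', and digits.
NOISE = re.compile(r"[#@!0-9]")
# Example threshold: queries with fewer than two tokens are filtered out.
MIN_TOKENS = 2


def passes_restrictive_filters(query: str) -> bool:
    """Return True if the query survives both restrictive filters."""
    if NOISE.search(query):
        return False  # first filter: noise characters present
    return len(query.split()) >= MIN_TOKENS  # second filter: minimal tokens


for q in ["jon smith", "jon@smith.com report", "smith"]:
    print(q, passes_restrictive_filters(q))
```

A query that fails either check would be classified as not being a name query and would not reach the mining phases.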

In the first mining phase, nearest neighbors are mined for each name term from a name directory or other store. For each query, the search engine 104 mines a list of nearest neighbors.

Thus,

Query: {token A}-{nearest neighbor 1, nearest neighbor 2, . . . , nearest neighbor N} through {token N}-{nearest neighbor 1, nearest neighbor 2, . . . , nearest neighbor N}.

Each nearest neighbor can be associated with a similarity score to determine how close a potential candidate is to an original query. Potential candidates that do not satisfy a number of pruning conditions can also be pruned, the conditions including, but not limited to:

a) A potential candidate includes a minimum number of valid character tokens (e.g., at least two (2)).

b) A distance measured between the potential candidate and the original query is below a predefined threshold (e.g., four (4)). The distance measure of an embodiment can be defined as the length offset.

c) A similarity score associated with the potential candidate is at or above a defined threshold (e.g., >=90%).
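The pruning conditions a) through c) can be sketched as a single predicate. This example assumes that a candidate is kept only when all three conditions hold, that "valid character tokens" means alphabetic characters, and that the length offset is the absolute difference in string lengths; the function name `keep_candidate` and the default values mirror the examples above but are otherwise hypothetical.

```python
def keep_candidate(candidate, query_token, similarity,
                   min_chars=2, max_len_offset=4, min_similarity=0.90):
    """Return True if a potential candidate survives the pruning conditions."""
    # a) Minimum number of valid (here: alphabetic) characters.
    valid_chars = sum(ch.isalpha() for ch in candidate)
    # b) Length offset between candidate and query token below the threshold.
    length_offset = abs(len(candidate) - len(query_token))
    # c) Similarity score at or above the threshold.
    return (valid_chars >= min_chars
            and length_offset < max_len_offset
            and similarity >= min_similarity)


print(keep_candidate("smith", "smiht", 0.95))
print(keep_candidate("smithsonian", "smit", 0.95))
```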

After pruning the potential candidates, remaining candidates proceed to the second mining phase. If no potential candidates pass through after the pruning operations, then the flow does not proceed to the second mining phase. In the second mining phase, all candidate names (e.g., complete names (first name, last name)) are selected from a name directory or other repository that includes one or more of the potential candidate names. Since every name term can be associated with a similarity score, averages can be used to flatten the scores and calculate a geometric distance average score for each mined candidate. As part of the second mining phase, the search engine 104 operates to eliminate all such candidate names that have fewer terms than the original query.
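One possible reading of the second mining phase is sketched below. The use of a geometric mean to flatten the per-term scores is an assumption, as are the function name `second_phase`, the whitespace-based splitting of names, and the sample data.

```python
import math


def second_phase(query_tokens, token_scores, directory):
    """Select full names containing surviving tokens and average their scores.

    token_scores maps each surviving directory token to its similarity score.
    """
    results = {}
    for name in directory:
        terms = name.lower().split()
        # Eliminate candidate names with fewer terms than the original query.
        if len(terms) < len(query_tokens):
            continue
        scores = [token_scores[t] for t in terms if t in token_scores]
        if not scores:
            continue  # name contains none of the surviving tokens
        # Assumed flattening step: geometric mean of the matched term scores.
        results[name] = math.prod(scores) ** (1.0 / len(scores))
    return results


# Hypothetical data: the query "jon smith" with two surviving directory tokens.
directory = ["John Smith", "Smith", "Jane Doe"]
token_scores = {"john": 0.81, "smith": 1.0}
print(second_phase(["jon", "smith"], token_scores, directory))
```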

The search engine 104 also uses an adaptive filter that uses the output of the candidate mining process to evaluate a structural similarity between mined names and an original query. The adaptive filter of an embodiment uses two name determination levels, wherein each name determination level includes a fixed set of constraint types, each set having different threshold values. Valid name results can be identified from the mined names that satisfy each constraint of one of the name determination levels. Correspondingly, if one or more mined potential candidates satisfy all of the constraints in a level, then the query is considered as a valid name query and used as part of providing an altered, expanded, or transformed query. In an embodiment, the adaptive filter constraint set includes a geometric distance measure, an edit distance measure, a string similarity measure, and/or a cumulative similarity measure collectively used to identify one or more first and last names to be included as part of an expanded query formulation.

The geometric distance measure of an embodiment includes the use of a canonical geometric score associated with a mined name candidate and an original query. For example, the canonical geometric score can include a level 1 threshold of 99% and a level 0 threshold of 95%.

The edit distance measure of an embodiment includes the use of a Levenshtein edit distance score associated with each mined name candidate and the original query. The Levenshtein edit distance is a measure of similarity between two (2) strings (e.g., source and target strings) accounting for a number of deletions, insertions, or substitutions required to transform the source string to the target string. The higher the edit distance, the less similarity between two (2) strings. For example, a factor of four (4) can be used as part of the edit distance score using a scoring function f(n)=1/(1+n) where n is the number of deletions, insertions, or substitutions required to transform the source string to the target string. For example, the edit distance score can include a level 1 threshold of 0.4 and a level 0 threshold of 0.5.
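The following sketch computes a Levenshtein edit distance with the standard dynamic programming recurrence and derives a score from f(n)=1/(1+n). How the factor of four (4) enters the score is not spelled out above; multiplying f(n) by the factor is an assumption of this example, as are the function names.

```python
def levenshtein(source: str, target: str) -> int:
    """Number of deletions, insertions, or substitutions between two strings."""
    prev = list(range(len(target) + 1))
    for i, sc in enumerate(source, start=1):
        curr = [i]
        for j, tc in enumerate(target, start=1):
            cost = 0 if sc == tc else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_distance_score(source: str, target: str, factor: float = 4.0) -> float:
    """Assumed scoring: factor * f(n), with f(n) = 1 / (1 + n)."""
    return factor / (1 + levenshtein(source, target))


print(levenshtein("kitten", "sitting"), edit_distance_score("kitten", "sitting"))
```

Under this assumed scaling, identical strings score 4.0 and the score falls toward zero as the number of required edits grows.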

The string similarity measure of an embodiment includes the use of a Jaro-Winkler string similarity measure or distance score associated with a mined name candidate and the original query. The higher the Jaro-Winkler distance, the more similar two strings are. For example, a factor of four (4) can be used as part of the Jaro-Winkler distance score using a scoring function f(n)=1/(2−n). For example, the Jaro-Winkler distance score can include a level 1 threshold of 2.95 and a level 0 threshold of 3.0.
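By way of illustration only, a Jaro-Winkler similarity and the associated distance score can be sketched in Python as follows. The function names are hypothetical, the prefix weight of 0.1 and the maximum prefix length of 4 are the conventional Jaro-Winkler defaults rather than values taken from this description, and the factor of four (4) is assumed to multiply f(n)=1/(2−n).

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order, k = 0, 0
    for i, ch in enumerate(s1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if ch != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_weight: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1 - base)


def jaro_winkler_score(n: float, factor: float = 4.0) -> float:
    """Scoring function f(n) = 1/(2 - n), scaled by an assumed multiplicative factor."""
    return factor / (2 - n)


print(jaro_winkler_score(jaro_winkler("Dillilah", "Dililah")))
```

Under the multiplicative assumption the score ranges from 2.0 (no similarity) to 4.0 (identical strings), which is the range in which the exemplary thresholds of 2.95 and 3.0 fall.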

The cumulative similarity measure of an embodiment is based on the geometric score, the edit distance score, and the string similarity score. For example, the cumulative similarity score can include a level 1 threshold of 5.75 and a level 0 threshold of 5.0.

If any mined name candidate satisfies all constraints of either level, then each such mined name candidate can be classified as a valid name query expander (e.g., first and last name). As such, the original query can be classified as a valid name query and/or transformed to include any name query expander output as part of an expanded name query. An expanded name query, including corrected name inputs used as expander terms, can be executed by the search engine 104 to locate information associated with a searched over name or names and/or to perform any other customizable actions (e.g., display a photo, display authored documents or links, pull a profile, provide a v-card, etc.).

The search engine 104 of an embodiment operates to automatically execute an expanded name query against a name directory or other information repository, returning relevant information associated with a person or persons of interest. The search engine 104 can also preemptively provide additional information associated with a person of interest, such as a profile, picture, authored documents, etc. The search engine 104 can use any number of relevancy algorithms as part of returning search results including links associated with files, documents, web pages, file content, virtual content, web-based content, etc. For example, the search engine 104 can use text, property information, and/or metadata when returning relevant search results associated with local files, remotely networked files, combinations of local and remote files, other data structures, etc.

The functionality described herein can be used by or as part of an operating system (OS), file system, web-based system, or other searching system, but is not so limited. The functionality can also be provided as an added component or feature and used by a host system or other application. In one embodiment, the environment 100 can be communicatively coupled to a file system, virtual web, network, and/or other information sources as part of providing searching features. An exemplary computing system that provides query expansion and searching features includes suitable programming means for operating in accordance with a method of providing mined name information and/or search results.

Suitable programming means include any means for directing a computer system or device to execute steps of a method, including for example, systems comprised of processing units and arithmetic-logic circuits coupled to computer memory, which systems have the capability of storing in computer memory, which computer memory includes electronic circuits configured to store data and program instructions. An exemplary computer program product is useable with any suitable data processing system. While a certain number and types of components are described herein, it will be appreciated that other numbers and/or types and/or configurations can be included according to various embodiments. Accordingly, component functionality can be further divided and/or combined with other component functionalities according to desired implementations.

FIG. 2 is a flow diagram illustrating an exemplary process 200 of providing expanded query features including mining for valid name candidates and providing expanded name queries based in part on original query inputs, but is not so limited. At 202, the process 200 operates to process an original query input as part of determining people-centric query intent. For example, the process 200 at 202 can operate to pre-process the original query input including parsing operations and/or removing any invalid characters and/or identifying non-name queries. At 204, the process 200 determines if the original query input is a name query. If the original query input is not a name query then the flow returns to 202 and the process 200 waits for a new input.

If the process 200 at 204 determines that the original query input is a name query, then the process 200 at 206 of an embodiment operates to provide any mined name candidates using a similarity determination. For example, the process 200 can use a networked search server to quantify similarity determinations made as part of identifying nearest neighbor first and last name candidates to an original first and last name query input. If no mined candidates are provided at 206, the process 200 returns to 202 and waits for a new input.

At 208, the process 200 uses a name matching determination to identify any valid names from the mined name candidates. For example, the process 200 at 208 can operate to identify valid first and last names based in part on a number of constraints and associated threshold levels to validate mined name candidates. At 210, the process 200 operates to provide an expanded query including the original query input and one or more validated first and last names.

It will be appreciated that processing and/or networking features can assist in providing real-time name searching and mining features. The process 200 of an embodiment can also operate to automatically execute an expanded query without any user input other than the original query. Aspects of the process 200 can be distributed to and among other components of a computing architecture, and client, server, and other examples and embodiments are not intended to limit features described herein. While a certain number and order of operations is described for the exemplary flow of FIG. 2, it will be appreciated that other numbers and/or orders can be used according to desired implementations.

FIG. 3 is a functional block diagram of an exemplary people-centric search system 300 that includes functionality to provide personal name mining, matching, and other searching features as part of a name matching process. While a client is not shown, it will be appreciated that many types of computing devices/systems and searching interfaces can use features of the people-centric search system 300. For example, a user can submit a search query including one or more query terms using a smartphone interface, laptop computer interface, tablet interface, desktop interface, or other computer/communication interface as part of mining for name candidates associated with a people-centric input query.

As shown, the exemplary people-centric search system 300 includes an invalid query detector component 302, a minimum query tokens detector component 304, a mined candidate generator component 306, a name matcher component 308, and a query expansion component 310, but is not so limited. The query expansion component 310 operates in part to use an original query input and a number of valid name candidates to generate an expanded query to use as part of a searching operation.

The invalid query detector component 302 is configured to filter invalid queries based in part on assessing a number of original query tokens. For example, if a query string contains non-name terms, the invalid query detector component 302 operates to output the original query string for use in a general search interface or system and prevents the original query from proceeding further in the name matching process. The minimum query tokens detector component 304 is configured to filter queries having a number of query tokens that is less than a defined query token threshold. For example, if a query string contains fewer tokens than the threshold, the minimum query tokens detector component 304 operates to output the original query string for use in a general search interface or system and prevents the original query from proceeding further in the name matching process.

The mined candidate generator component 306 operates in part to mine name candidates from a name directory and/or other repository that have a certain degree or measure of similarity to the original query. The mined candidate generator component 306 can output one or more mined name candidates as an input to the name matcher component 308. If no mined name candidates pass through the mined candidate generator component 306, the original query string can be output for use in the general search interface or system and prevented from proceeding on to the name matcher component 308.

If mined name candidates pass through the mined candidate generator component 306, the original query string along with any mined name candidates are output to the name matcher component 308. The name matcher component 308 operates to perform a similarity assessment between the original query and any mined name candidates output from the mined candidate generator component 306. The name matcher component 308 is configured to output one or more valid personal names, including name corrections, as an input to the query expansion component 310. However, if no valid personal names are output from the name matcher component 308, the original query string is output for use in the general search interface or system, and the people-centric search system 300 waits for further input. While a number of components and features are described, other embodiments are included and configurable.

FIG. 4 is a flow diagram illustrating an exemplary process 400 of providing name mining and matching features, but is not so limited. For example, the process 400 can be used to mine name candidates from an information repository and use a name matching algorithm to output one or more valid names. At 402, the process 400 receives an original query string. For example, an end-user can use a search interface of a client application to submit an original search query as part of performing a search that the process 400 can recognize as focusing on a person of interest.

At 404, the process 400 operates to identify dirty or non-name queries based in part on the original query string. If the process 400 identifies the original query string as dirty, the flow proceeds to 406 wherein an organic search interface or process can be used to process the original query string. For example, the process 400 can operate at 404 to locate tokens that are deemed non-name tokens. If the original query string is not identified as dirty at 404, then the process 400 at 408 determines if the original query string has a defined minimum number of query tokens. If the process 400 identifies that the number of original query string tokens is insufficient at 408, then flow proceeds to 406, wherein the organic search interface or process can be used to process the original query string.

If the process 400 identifies that the number of original query string tokens is sufficient at 408, then flow proceeds to 410 and the process 400 operates to generate mined name candidates. For example, the process 400 can use a similarity determination between the original input tokens to identify nearest neighbor name candidates based in part on a similarity measure associated with names of a directory component. If the process 400 does not identify any mined name candidates at 410, the flow proceeds to 406 wherein an organic search interface or process can be used to process the original query string.

If the process 400 identified any mined name candidates at 412, the flow proceeds to 414. If any mined name candidate satisfies all of the level 1 constraints at 414, the flow proceeds to 416 and the process 400 operates to alter the original query to include the original query string and one or more valid names corresponding to one or more mined name candidates that satisfied all level 1 constraints. If no mined name candidate satisfies all of the level 1 constraints at 414, the flow proceeds to 418. If any mined name candidate satisfies all of the level 0 constraints at 418, the flow proceeds to 416 and the process 400 operates to alter the original query to include the original query string and one or more valid names corresponding to one or more mined name candidates that satisfied all level 0 constraints. In one embodiment, the process 400 uses the same set of constraint types for level 0 and level 1, albeit with different threshold values. For example, a constraint set can include a geometric similarity or distance measure, an edit distance measure, a string similarity measure, and a cumulative measure to process mined name candidates, wherein different constraint level values can be implemented to stress an amount of importance associated with each measure.

With continuing reference to FIG. 4, if no mined name candidate satisfies all of the level 0 constraints at 418, then the flow proceeds to 406 wherein the organic search interface or process can be used to process the original query string. An altered query including one or more valid name expanders can be automatically executed as part of a search engine operation to provide relevant search results. In one embodiment, the process 400 can operate to correct an original query input and use the corrected name as an altered, new, or rewritten query. Additionally, the process 400 can include functionality to perform one or more customizable actions, such as linking social networking information and/or providing a short biography with returned name results as examples. The process 400 can be used as part of a people-centric search and is not intended to be limited to any particular type of search corpus.

It will be appreciated that processing and/or networking features can assist in providing real-time searching and expertise mining features. Aspects of the process 400 can be distributed to and among other components of a computing architecture, and other examples and embodiments are not intended to limit features described herein. While a certain number and order of operations is described for the exemplary flow of FIG. 4, it will be appreciated that other numbers and/or orders can be used according to desired implementations.

An illustrative example of name mining and matching features is described below. Assume for this example that a search service uses a name mining and matching algorithm in part to determine people-centric query intent, correct misspelled name inputs, and/or provide expanded name queries using an original query input.

Consider the following queries:

Query A: “Windows Phone 7”

Query B: “Sarah Sinofsky Blog Article”

Query C: “Dillilah Mayorson”

Query D: “Town Hall”

Query E: “workflow”

Assume that the queries are based on inputs to a search interface of a smartphone or other handheld device, laptop, desktop, tablet, etc.

As described below, a number of processing nodes are used to process a query as part of classification as a name query. An invalid query detector node can be configured as a restrictive filter to identify specific character tokens of an original query input that are dirty. If such a token is identified, the original query input does not proceed to the next node. Exemplary dirty character tokens include tokens that are not generally found in names, such as ‘#’, ‘@’, ‘!’, [0-9], etc.

Result of invalid query detector node:

Query A will be deemed dirty and will not continue to subsequent nodes. However, queries B-E pass through the invalid query detector node.

A minimum query tokens detector node can be configured as a restrictive filter to verify that a sufficient number of tokens is included as part of the original query input. For example, a parsing service can parse an original query input into term tokens. For this example, as part of mining personal name candidates, if the number of original tokens falls below a predefined threshold (e.g., two (2)), then the original query input is not considered as a name query and does not proceed to the next processing node.

Result of minimum query tokens detector node:

Query B: Word broken into {Sarah, Sinofsky, Blog, Article}-size 4

Query C: Word broken into {Dillilah, Mayorson}-size 2

Query D: Word broken into {Town, Hall}-size 2

Query E: Word broken into {workflow}-size 1-Does not meet minimum threshold and is marked invalid.
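By way of illustration only, the two restrictive filter nodes described above can be sketched in Python and applied to Queries A-E. The function names are hypothetical, whitespace splitting stands in for the parsing service, and the dirty character set is limited to the exemplary tokens listed above.

```python
import re

# Exemplary dirty character tokens from the description: '#', '@', '!', and digits.
DIRTY_TOKEN_PATTERN = re.compile(r"[#@!0-9]")
MIN_QUERY_TOKENS = 2  # exemplary minimum token threshold


def is_dirty(query: str) -> bool:
    """Invalid query detector: true if any dirty character token is present."""
    return DIRTY_TOKEN_PATTERN.search(query) is not None


def tokenize(query: str) -> list[str]:
    """Stand-in for the parsing service; splits on whitespace only."""
    return query.split()


def passes_restrictive_filters(query: str) -> bool:
    """A query proceeds to candidate mining only if it clears both filter nodes."""
    if is_dirty(query):
        return False
    return len(tokenize(query)) >= MIN_QUERY_TOKENS


queries = {
    "A": "Windows Phone 7",
    "B": "Sarah Sinofsky Blog Article",
    "C": "Dillilah Mayorson",
    "D": "Town Hall",
    "E": "workflow",
}
for label, text in queries.items():
    print(label, passes_restrictive_filters(text))
```

Queries that fail either filter would be handed to the general search interface unchanged rather than discarded.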

A mined candidate generator node can be configured as an active filter having two phases. In the first phase, nearest neighbors are mined for each name term using a name directory. That is, for each query term, mine a list of nearest neighbors.

For Query C of this example:

Query C:

Original Token    Nearest neighbors
{Dillilah}        {Dililah, Dilila, Dellilah}
{Mayorson}        {Meyerzon, Mayerzon, Michelson, Myerson, Myer}

Each nearest neighbor is based in part on a similarity score which determines how close a candidate is to the original query input.

Similarity scores, shown as percentages, for Query C are:

- {Dillilah}-{Dililah (93%), Dilila (95%), Dellilah (99%)}
- {Mayorson}-{Meyerzon (97%), Mayerzon (98%), Michelson (89%), Myerson (98%), Myer (90%)}

A pruning node can be used to prune any candidate that fails one or more of the following three conditions:

1) The candidate includes a minimum number of valid character tokens (e.g., at least 2).

2) The distance between the candidate and the original query term is below a predefined threshold (e.g., four (4)). The distance here is the length offset, i.e., the difference in length between the two strings.

(Thus: Mined candidate “Myer” is eliminated as the distance between “Mayorson” and “Myer” is 4).

And, 3) The similarity score is at or above a pre-defined threshold (e.g., >=90%).

Thus: Mined candidate “Michelson” is eliminated as its similarity score (89%) is below the 90% threshold.

After pruning the list of mined candidates, any remaining candidates proceed to the next node. If no candidates remain after pruning, the processing ceases.
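By way of illustration only, the pruning conditions can be sketched in Python as follows. The function and parameter names are hypothetical. Following the worked example, in which “Myer” is pruned at a length offset of exactly 4, the sketch treats the offset threshold as exclusive; reading “valid character tokens” as the character count of the candidate is likewise an assumption.

```python
def prune_candidates(original_term: str,
                     scored_candidates: dict[str, int],
                     min_characters: int = 2,
                     offset_threshold: int = 4,
                     min_similarity: int = 90) -> dict[str, int]:
    """Keep only candidates that satisfy all three pruning conditions."""
    kept = {}
    for candidate, similarity in scored_candidates.items():
        if len(candidate) < min_characters:
            continue  # condition 1: too few characters
        if abs(len(original_term) - len(candidate)) >= offset_threshold:
            continue  # condition 2: length offset too large
        if similarity < min_similarity:
            continue  # condition 3: similarity score too low
        kept[candidate] = similarity
    return kept


# Similarity percentages from the Query C example.
last_name_neighbors = {"Meyerzon": 97, "Mayerzon": 98, "Michelson": 89,
                       "Myerson": 98, "Myer": 90}
print(prune_candidates("Mayorson", last_name_neighbors))
```

If the returned mapping is empty for every query term, processing would cease as described above.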

For this example, the following mined candidates proceed:

- {Dillilah}-{Dililah (93%), Dilila (95%), Dellilah (99%)}
- {Mayorson}-{Meyerzon (97%), Mayerzon (98%), Myerson (98%)}

In the second phase, name mining and matching features are used to select all names (complete names (e.g., first, last)) from a name directory that have one or more of the mined candidate terms.

For this example, mined names from the name directory include:

{Dililah Meyerzon, Dilila Mayerzon, Chadd Myerson, Dellilah Petruic}

As part of the second phase, the similarity scores for each mined name are averaged, such that:

Mined Name 1: Dililah Meyerzon (93+97)/2=95%

Mined Name 2: Dilila Mayerzon (95+98)/2=96.5%

Mined Name 3: Chadd Myerson (0+98)/2=49%

Mined Name 4: Dellilah Petruic (99+0)/2=49.5%
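By way of illustration only, the second phase selection and averaging can be sketched in Python as follows. The function name and the in-memory directory are hypothetical (the entry “Avery Stone” is included only to show a non-matching name), and the sketch assumes two-term names in which an unmatched term contributes a score of 0, as in Mined Names 3 and 4.

```python
def mine_full_names(directory: list[str],
                    first_scores: dict[str, int],
                    last_scores: dict[str, int]) -> dict[str, float]:
    """Select directory names containing a mined term and average the term scores."""
    mined = {}
    for full_name in directory:
        first, last = full_name.split()
        first_score = first_scores.get(first, 0)  # 0 when the term was not mined
        last_score = last_scores.get(last, 0)
        if first_score or last_score:
            mined[full_name] = (first_score + last_score) / 2
    return mined


first_scores = {"Dililah": 93, "Dilila": 95, "Dellilah": 99}
last_scores = {"Meyerzon": 97, "Mayerzon": 98, "Myerson": 98}
directory = ["Dililah Meyerzon", "Dilila Mayerzon", "Chadd Myerson",
             "Dellilah Petruic", "Avery Stone"]

for name, score in mine_full_names(directory, first_scores, last_scores).items():
    print(f"{name}: {score}%")
```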

The second phase also operates to eliminate all such mined names that include fewer terms than in the corresponding original query input. For example, if “Sarah Sinofsky” was a mined name for Query B (“Sarah Sinofsky Blog Article”), since the number of terms in the mined name is less than the original query (i.e., 2<4), the mined name “Sarah Sinofsky” would be removed from further consideration. If no potential name candidates are mined, then the name matching process stops or exits to another process.
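By way of illustration only, the term count check can be sketched in Python as follows. The function name is hypothetical and whitespace splitting is assumed as the term tokenizer.

```python
def drop_names_with_fewer_terms(original_query: str, mined_names: list[str]) -> list[str]:
    """Remove mined names that have fewer terms than the original query input."""
    query_term_count = len(original_query.split())
    return [name for name in mined_names if len(name.split()) >= query_term_count]


# Query B has four terms, so a two-term mined name is removed.
print(drop_names_with_fewer_terms("Sarah Sinofsky Blog Article", ["Sarah Sinofsky"]))
# Query C has two terms, so two-term mined names are retained.
print(drop_names_with_fewer_terms("Dillilah Mayorson", ["Dililah Meyerzon", "Dilila Mayerzon"]))
```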

The name matching process of an embodiment includes the use of a name matcher node configured as a passive adaptive filter to process all valid mined name candidates based in part on a structural similarity measure associated with the mined name candidates and the original query input. The name matcher node of an embodiment includes two (2) levels or phases comprising the same set of constraints but using different threshold values to quantify different name matching features. If one or more of the mined candidates satisfy all the constraints in a level, then the original query input is considered as a valid name query.

Exemplary level constraints include, but are not limited to:

1) A canonical geometric score or mined confidence factor between each mined name candidate and the original query input. For example, a level 1 threshold of 99% and a level 0 threshold of 95% can be used as determination thresholds for the geometric distance constraint.

2) A Levenshtein edit distance score between each mined name candidate and the original query input that defines a measure of similarity between strings. For example, the edit distance between the source and target strings is the number of deletions, insertions, and/or substitutions required to transform the source string to the target string. The greater the edit distance, the less similarity between the strings. For this example, a factor of 4 is used for the edit distance score using a scoring function of f(n)=1/(1+n) where n is the number of deletions, insertions, and/or substitutions required to transform source to destination. For example, a level 1 threshold of 0.4 and a level 0 threshold of 0.5 can be used as determination thresholds for the edit distance constraint.

3) A Jaro-Winkler distance score between each mined name candidate and the original query input. The greater the Jaro-Winkler distance score, the more similar the strings. For this example, a factor of 4 is used for the Jaro-Winkler distance score using a scoring function of f(n)=1/(2−n). For example, a level 1 threshold of 2.95 and a level 0 threshold of 3.0 can be used as determination thresholds for the Jaro-Winkler constraint.

4) A cumulative similarity score between each mined name candidate and the original query comprising a function defined in part by the geometric score, Levenshtein edit distance score, and the Jaro-Winkler distance score.

For example, the cumulative similarity score can be calculated as:

[(GeometricSimilarityFactor * minedCandidate.Confidence * minedCandidateLength) + editSimilarity + jaroWinklerSimilarity]

For example, with a geometric similarity factor or measure of 1.0, a level 1 threshold of 5.75 and a level 0 threshold of 5.0 can be used as determination thresholds for the cumulative similarity constraint.
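By way of illustration only, the cumulative similarity calculation can be sketched in Python as follows. The function and parameter names are hypothetical; the input values are those given for the mined name “Dililah Meyerzon” in the worked example of this description (confidence 0.95, mined candidate length 2, edit distance score 0.5, Jaro-Winkler distance score 3.13).

```python
def cumulative_similarity(confidence: float,
                          candidate_length: int,
                          edit_similarity: float,
                          jaro_winkler_similarity: float,
                          geometric_similarity_factor: float = 1.0) -> float:
    """Weighted geometric term plus the edit distance and Jaro-Winkler scores."""
    geometric_term = geometric_similarity_factor * confidence * candidate_length
    return geometric_term + edit_similarity + jaro_winkler_similarity


score = cumulative_similarity(confidence=0.95, candidate_length=2,
                              edit_similarity=0.5, jaro_winkler_similarity=3.13)
print(round(score, 2))  # (0.95 * 2) + 0.5 + 3.13
```

The result falls between the exemplary level 0 threshold of 5.0 and the exemplary level 1 threshold of 5.75, so this candidate would satisfy the cumulative constraint at level 0 only.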

As described above, according to an embodiment, to pass through as a valid name and be used as part of an expanded or transformed query, a mined name candidate has to satisfy all four constraints using either level 1 or level 0 thresholds.

For the current example:

Original query input: Dillilah Mayorson

Mined Name 1: Dililah Meyerzon (Geom Score=95%, mined candidate length=2, Lev. Score=0.5, JW Score=3.13)

Mined Name 2: Dilila Mayerzon (Geom Score=96.5%, Lev. Score=0.52, JW Score=3.26)

Mined Name 3: Chadd Myerson (Geom Score 49%)

Mined Name 4: Dellilah Petruic (Geom Score 49.5%)

Using compiled data for the first and last name candidates, the name matching algorithm determines that:

1) None of the mined name candidates meet the Level 1 geometric distance threshold of 99%, so the Level 0 thresholds are then considered for the full first and last name candidates. Only Dililah Meyerzon and Dilila Mayerzon satisfy the Level 0 geometric distance threshold of 95%.

2) Dililah Meyerzon has a geometric distance of 0.95, Levenshtein edit distance score of 0.5, Jaro-Winkler distance score of 3.13, and cumulative similarity score of 5.53 [(0.95*2)+0.5+3.13], and therefore satisfies all of the level 0 thresholds and is identified as a valid name query. Dilila Mayerzon is also considered a valid name query by satisfying all of the level 0 constraints. Accordingly, Query C is identified as a name query associated with a people-centric search intent. Using the valid name queries, the original query input (Query C) can be altered or transformed to (Dililah AND Meyerzon) OR (Dilila AND Mayerzon) and used by the search service to provide relevant search results associated with the person-centric query intent. It is noted that the transformed query includes the valid combined first and last names for each mined candidate that satisfied the name matching determination.
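By way of illustration only, the transformation of Query C into the boolean form shown above can be sketched in Python as follows. The function name is hypothetical, and the AND/OR string syntax simply mirrors the example; an actual search service may use a different query language.

```python
def build_transformed_query(valid_names: list[str]) -> str:
    """Join the terms of each valid name with AND, and the names with OR."""
    clauses = ["(" + " AND ".join(name.split()) + ")" for name in valid_names]
    return " OR ".join(clauses)


print(build_transformed_query(["Dililah Meyerzon", "Dilila Mayerzon"]))
```

In embodiments where the expanded query also retains the original query input, the original terms could be appended as a further clause in the same manner.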

An exemplary name matcher can be encoded as:

Procedure: Name Matcher
Begin Procedure:
  For (level: 1 to 0)
  Begin For:
    For (minedCandidate: 1 to MinedCandidateCollection.Size)
    Begin For:
      geometricDistance = DistanceThreshold[level][Geometric]
      editDistance = DistanceThreshold[level][Edit]
      jaroWinklerDistance = DistanceThreshold[level][JaroWinkler]
      cumulativeDistance = DistanceThreshold[level][Cumulative]
      If (minedCandidate.GeometricScore < geometricDistance)
        //Candidates are sorted by geometric score. No subsequent candidate satisfies the minimum
        //threshold condition. This is not a name query.
        BREAK
      If (minedCandidate.EditDistanceScore < editDistance)
        //This candidate does not meet the edit distance threshold, move to the next candidate
        CONTINUE
      If (minedCandidate.JaroWinklerDistanceScore < jaroWinklerDistance)
        //This candidate does not meet the jaro distance threshold, move to the next candidate
        CONTINUE
      If (minedCandidate.CumulativeDistanceScore < cumulativeDistance)
        //This candidate does not meet the cumulative distance threshold, move to the next candidate
        CONTINUE
      //Satisfies all the constraints, this is a valid name query. Add it to the bucket
      ValidNameBucket.Add(minedCandidate)
    End For
    If (ValidNameBucket.NotEmpty)
      //No need to move to the next level if there are already high confidence results in the top level
      BREAK
  End For
  If (ValidNameBucket.NotEmpty)
    //Sort the results and return the top result
    ValidNameBucket.Sort
    Return NameQuery = TRUE
  Else
    Return NameQuery = FALSE
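By way of illustration only, the name matcher procedure can be sketched as runnable Python and applied to the Query C candidates. The class and function names are hypothetical. The thresholds are the exemplary level 1 and level 0 values given above. The cumulative score of 5.71 for “Dilila Mayerzon” is derived here from the cumulative formula assuming a candidate length of 2, and the zero-valued scores for “Dellilah Petruic” and “Chadd Myerson” are placeholders, since the description gives only their geometric scores.

```python
from dataclasses import dataclass


@dataclass
class MinedCandidate:
    name: str
    geometric: float
    edit: float
    jaro_winkler: float
    cumulative: float


# Exemplary thresholds per name determination level.
THRESHOLDS = {
    1: {"geometric": 0.99, "edit": 0.4, "jaro_winkler": 2.95, "cumulative": 5.75},
    0: {"geometric": 0.95, "edit": 0.5, "jaro_winkler": 3.0, "cumulative": 5.0},
}


def match_names(candidates: list[MinedCandidate]) -> list[MinedCandidate]:
    """Return the valid names from the first level (1, then 0) that yields any."""
    # Sort by geometric score so the loop can stop at the first failure.
    ordered = sorted(candidates, key=lambda c: c.geometric, reverse=True)
    for level in (1, 0):
        limits = THRESHOLDS[level]
        valid = []
        for candidate in ordered:
            if candidate.geometric < limits["geometric"]:
                break  # no later candidate can pass at this level
            if candidate.edit < limits["edit"]:
                continue
            if candidate.jaro_winkler < limits["jaro_winkler"]:
                continue
            if candidate.cumulative < limits["cumulative"]:
                continue
            valid.append(candidate)
        if valid:
            return valid  # high confidence results found; skip the next level
    return []


candidates = [
    MinedCandidate("Dililah Meyerzon", 0.95, 0.5, 3.13, 5.53),
    MinedCandidate("Dilila Mayerzon", 0.965, 0.52, 3.26, 5.71),   # cumulative derived
    MinedCandidate("Chadd Myerson", 0.49, 0.0, 0.0, 0.0),         # placeholder scores
    MinedCandidate("Dellilah Petruic", 0.495, 0.0, 0.0, 0.0),     # placeholder scores
]
valid_names = match_names(candidates)
is_name_query = bool(valid_names)
print(is_name_query, [c.name for c in valid_names])
```

Unlike the procedure above, which returns only a TRUE or FALSE name query determination, the sketch returns the validated candidates themselves so that they can be used directly as query expanders.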

While certain embodiments are described herein, other embodiments are available, and the described embodiments should not be used to limit the claims. Exemplary communication environments for the various embodiments can include the use of secure networks, unsecure networks, hybrid networks, and/or some other network or combination of networks. By way of example, and not limitation, the environment can include wired media such as a wired network or direct-wired connection, and/or wireless media such as acoustic, radio frequency (RF), infrared, and/or other wired and/or wireless media and components. In addition to computing systems, devices, etc., various embodiments can be implemented as a computer process (e.g., a method), an article of manufacture, such as a computer program product or computer readable media, computer readable storage medium, and/or as part of various communication architectures.

The term computer readable media as used herein may include computer storage media. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. System memory, removable storage, and non-removable storage are all computer storage media examples (i.e., memory storage). Computer storage media may include, but is not limited to, RAM, ROM, electrically erasable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store information and which can be accessed by a computing device. Any such computer storage media may be part of a device.

The embodiments and examples described herein are not intended to be limiting and other embodiments are available. Moreover, the components described above can be implemented as part of a networked, distributed, and/or other computer-implemented environment. The components can communicate via wired, wireless, and/or a combination of communication networks. Network components and/or couplings between components can include any type, number, and/or combination of networks, and the corresponding network components include, but are not limited to, wide area networks (WANs), local area networks (LANs), metropolitan area networks (MANs), proprietary networks, backend networks, etc.

Client computing devices/systems and servers can be any type and/or combination of processor-based devices or systems. Additionally, server functionality can include many components and include other servers. Components of the computing environments described in the singular may include multiple instances of such components. While certain embodiments include software implementations, they are not so limited and encompass hardware, or mixed hardware/software solutions. Other embodiments and configurations are available.

Exemplary Operating Environment

Referring now to FIG. 5, the following discussion is intended to provide a brief, general description of a suitable computing environment in which embodiments of the invention may be implemented. While the invention will be described in the general context of program modules that execute in conjunction with program modules that run on an operating system on a personal computer, those skilled in the art will recognize that the invention may also be implemented in combination with other types of computer systems and program modules.

Generally, program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

Referring now to FIG. 5, an illustrative operating environment for embodiments of the invention will be described. As shown in FIG. 5, computer 2 comprises a general purpose server, desktop, laptop, handheld, or other type of computer capable of executing one or more application programs. The computer 2 includes at least one central processing unit 8 (“CPU”), a system memory 12, including a random access memory 18 (“RAM”) and a read-only memory (“ROM”) 20, and a system bus 10 that couples the memory to the CPU 8. A basic input/output system containing the basic routines that help to transfer information between elements within the computer, such as during startup, is stored in the ROM 20. The computer 2 further includes a mass storage device 14 for storing an operating system 24, application programs, and other program modules.

The mass storage device 14 is connected to the CPU 8 through a mass storage controller (not shown) connected to the bus 10. The mass storage device 14 and its associated computer-readable media provide non-volatile storage for the computer 2. Although the description of computer-readable media contained herein refers to a mass storage device, such as a hard disk or CD-ROM drive, it should be appreciated by those skilled in the art that computer-readable media can be any available media that can be accessed or utilized by the computer 2.

By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, digital versatile disks (“DVD”), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer 2.

According to various embodiments of the invention, the computer 2 may operate in a networked environment using logical connections to remote computers through a network 4, such as a local network, the Internet, etc. The computer 2 may connect to the network 4 through a network interface unit 16 connected to the bus 10. It should be appreciated that the network interface unit 16 may also be utilized to connect to other types of networks and remote computing systems. The computer 2 may also include an input/output controller 22 for receiving and processing input from a number of other devices, including a keyboard, mouse, etc. (not shown). Similarly, an input/output controller 22 may provide output to a display screen, a printer, or other type of output device.

As mentioned briefly above, a number of program modules and data files may be stored in the mass storage device 14 and RAM 18 of the computer 2, including an operating system 24 suitable for controlling the operation of a networked personal computer, such as the WINDOWS operating systems from MICROSOFT CORPORATION of Redmond, Wash. The mass storage device 14 and RAM 18 may also store one or more program modules. In particular, the mass storage device 14 and the RAM 18 may store application programs, such as word processing, spreadsheet, drawing, e-mail, and other applications and/or program modules, etc.

It should be appreciated that various embodiments of the present invention can be implemented (1) as a sequence of computer implemented acts or program modules running on a computing system and/or (2) as interconnected machine logic circuits or circuit modules within the computing system. The implementation is a matter of choice dependent on the performance requirements of the computing system implementing the invention. Accordingly, logical operations including related algorithms can be referred to variously as operations, structural devices, acts or modules. It will be recognized by one skilled in the art that these operations, structural devices, acts and modules may be implemented in software, firmware, special purpose digital logic, and any combination thereof without deviating from the spirit and scope of the present invention as recited within the claims set forth herein.

Although the invention has been described in connection with various exemplary embodiments, those of ordinary skill in the art will understand that many modifications can be made thereto within the scope of the claims that follow. Accordingly, it is not intended that the scope of the invention in any way be limited by the above description, but instead be determined entirely by reference to the claims that follow.

1. A method comprising: generating a number of mined candidates based in part on using a learned hash function and valid pass-through of an original query input that includes determining if the original query input satisfies a first restrictive filter stage and a second restrictive filter stage; using an adaptive filter comprising a multi-level name matcher constraint set that includes first threshold levels and second threshold levels; and generating an expanded query that includes the original query input and any valid name corresponding to any mined candidate that satisfies all of the first threshold levels or all of the second threshold levels associated with the multi-level constraint set.
2. The method of claim 1, further comprising generating the expanded query using the multi-level name matcher constraint set including a geometric measure, an edit distance measure, a string similarity measure, and a cumulative similarity measure.
3. The method of claim 2, further comprising generating the expanded query using the multi-level name matcher constraint set, wherein the geometric measure comprises a canonical geometric score, the edit distance measure comprises a Levenshtein edit distance score, the string similarity measure comprises a Jaro-Winkler distance score, and the cumulative similarity measure comprises a cumulative distance score based in part on the canonical geometric score, the Levenshtein edit distance score, and the Jaro-Winkler distance score.
4. The method of claim 1, further comprising automatically executing the expanded query as part of discovering a searching intent to be people-centric including using customized actions to enrich a search experience.
5. The method of claim 4, further comprising automatically executing the expanded query containing a candidate expansion having a highest confidence measure according to a specified threshold to provide one or more person records including using the customized actions to perform one or more of rendering search results in a different way, returning a number of authored documents or other items, and fetching profile information from social and other networks.
6. The method of claim 1, further comprising using a first threshold value and a second threshold value as part of first and second geometric measure assessments, using a first threshold value and a second threshold value as part of first and second edit distance measure assessments, using a first threshold value and a second threshold value as part of first and second string similarity measure assessments, and using a first threshold value and a second threshold value as part of first and second cumulative similarity measure assessments, including returning one or more valid personal names to be used as part of the expanded query if one or more mined candidates satisfy all of the first threshold level values or all of the second threshold level values.
7. The method of claim 6, further comprising using the second threshold values once one of the first threshold level values is not satisfied.
8. The method of claim 5, further comprising assigning different threshold level values based in part on importance of one or more of geometric similarity constraints and structural similarity constraints.
9. The method of claim 1, further comprising using the similarity score and one or more mined name candidates that are nearest neighbors to original query tokens, wherein a similarity score is calculated for each nearest neighbor in part to determine relatedness between each mined first and last name candidate and the original query tokens.
10. The method of claim 9, further comprising mining a list of nearest neighbors for each query term of the original query input and determining if each mined nearest neighbor satisfies a similarity threshold value.
11. The method of claim 1, further comprising generating the number of mined candidates using the first restrictive filter stage comprising an invalid query detector filter and the second restrictive filter stage comprising a minimum query tokens detector filter, and using an active mining filter having a number of phases to further process potential candidates.
12. The method of claim 1, further comprising exiting to an organic search service if the number of mined candidates is zero or if no mined candidate satisfies all threshold levels of one of the first or second constraint sets of the adaptive filter.
13. A search system comprising: an invalid query detector component configured to filter invalid queries based in part on a number of original query tokens; a minimum query tokens detector component configured to filter out query inputs having a defined number of query tokens that are less than a defined query token threshold; a mined candidate generator component configured to generate mined name candidates based in part on the number of original query tokens and a similarity measure; a name matcher component configured to generate a number of valid names based in part on an output from the mined candidate generator and a plurality of threshold values associated with a first threshold determination stage and a second threshold determination stage; and a query expander component configured to provide expanded queries based in part on one or more original query terms and one or more valid names.
14. The search system of claim 13, the mined candidate generator to generate mined name candidates from a name repository based in part on a number of original query terms and a similarity measure threshold value including a selection of all first and last names from a name directory component that include one or more mined terms.
15. The search system of claim 14, the mined candidate generator to eliminate mined personal names that have fewer terms than the number of original query terms.
16. The search system of claim 13, the name matcher component to generate the number of valid names based in part on an output from the mined candidate generator and a geometric measure, an edit distance measure, a string similarity measure, and a cumulative similarity measure.
17. The search system of claim 13, the name matcher component to generate the number of valid names based in part on an output from the mined candidate generator and a canonical geometric score, a Levenshtein edit distance score, a Jaro-Winkler distance score, and a cumulative similarity score.
18. Computer storage, including instructions which, when executed, operate to: use an original query to mine personal names including using at least one restrictive filter and a similarity measure; generate altered query terms comprising personal names using an adaptive filter including a geometric measure, an edit distance measure, a string similarity measure, and a cumulative similarity measure; and provide an altered query using one or more valid personal names having satisfied at least one level of the adaptive filter.
19. The computer storage of claim 18, including instructions which, when executed, operate to automatically execute the altered query against a name directory.
20. The computer storage of claim 18, including instructions which, when executed, operate to use the altered query to provide information associated with one or more individuals including authored materials or contact information associated with a social network.
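
By way of illustration only, and not limitation, the two-level threshold test recited in claims 1, 6, and 7 might be sketched as follows. All names (Scores, ThresholdLevel, is_valid_name, expand_query), the numeric threshold values, and the direction of each comparison (edit distance treated as an upper bound, the other three measures treated as lower bounds) are assumptions made for this sketch and are not taken from the description. Only the Levenshtein edit distance is computed here; the geometric, string similarity, and cumulative scores are supplied as precomputed inputs.

```python
from dataclasses import dataclass


def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming Levenshtein edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


@dataclass(frozen=True)
class Scores:
    """Hypothetical container for the four measures of one mined candidate."""
    geometric: float
    edit_distance: int
    string_similarity: float
    cumulative: float


@dataclass(frozen=True)
class ThresholdLevel:
    """One level of the constraint set (comparison directions are assumed)."""
    min_geometric: float
    max_edit_distance: int
    min_string_similarity: float
    min_cumulative: float

    def satisfied_by(self, s: Scores) -> bool:
        # A level is satisfied only if every one of its thresholds is met.
        return (s.geometric >= self.min_geometric
                and s.edit_distance <= self.max_edit_distance
                and s.string_similarity >= self.min_string_similarity
                and s.cumulative >= self.min_cumulative)


def is_valid_name(s: Scores, first: ThresholdLevel, second: ThresholdLevel) -> bool:
    # The second level is consulted only once the first level fails.
    return first.satisfied_by(s) or second.satisfied_by(s)


def expand_query(original, candidates, first, second):
    """Return the original query followed by every valid mined name."""
    valid = [name for name, s in candidates if is_valid_name(s, first, second)]
    return [original] + valid
```

Under these assumptions, a caller would score each mined candidate against the original query, for example using levenshtein("jon smith", "john smith") for the edit distance measure, and pass the resulting (name, Scores) pairs to expand_query. An expansion containing only the original query would correspond to the case in which no mined candidate satisfies either level.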